(57) Abstract



Genetic mapping is provided by either of two related general procedures. In the first procedure, mapping is provided by 
identifying genetic regions from which DNA fragments derived from two individuals can combine to form extensive hybrids free 
of base mismatches. DNA is processed by a method that allows perfectly-matched hybrid DNA molecules formed between 
DNAs from the two individuals, to be separated from imperfectly-paired DNA hybrids or hybrids in which both strands are from 
the same individual's DNA. The perfectly-matched hybrid DNAs can then be labeled and the labeled DNA used as probes to
identify loci of identity-by-descent between the two individuals. In the second procedure, nicks are introduced specifically into
DNA hybrids formed between non-identical alleles from a region of heterozygosity in an individual diploid genome. The nicked
DNA molecules are then specifically labeled to provide probes for identifying regions of heterozygosity in the genome of an individual.
GENOMIC MISMATCH SCANNING



CROSS-REFERENCE TO GOVERNMENT GRANT 
This invention was made with Government support under
contract HG00450 awarded by the National Institutes of
Health. The government may have certain rights in this
invention.

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application is a continuation-in-part of 
application serial no. 880,167 filed on May 6, 1992. 

INTRODUCTION
Technical Field

The field of this invention is genetic mapping.

Background

Linkage mapping of genes involved in disease
susceptibility and other traits in humans, animals and
plants has in recent years become one of the most
important engines of progress in biology and medicine.
The development of polymorphic DNA markers as landmarks
for linkage mapping has been a major factor in this
advance. However, current methods that rely on these
markers for linkage mapping in humans are laborious,
allowing screening of at most a few markers at a
time. Furthermore, their power is limited by the sparsity
of highly informative markers in many parts of the human
genome.



WO 93/22462 PCT/US93/04160 


To map genes whose manifestations are recognized only
in the whole organism, the standard approach relies on
identifying linkage between the trait and a genetic marker
whose map position is already known. The most abundant
and generally useful class of markers in the human genome
are DNA sequence polymorphisms, either restriction fragment
length polymorphisms (RFLPs) or other DNA polymorphisms
that can be detected by hybridization with specific probes
or by amplification using specific primers.

RFLP mapping and related methods have had dramatic
successes. However, their utility is limited by two
problems: first, many discrete loci need to be examined,
but only one or a few loci can be typed at a time, which
is arduous; second, the low density on the map
and low polymorphism information content (PIC value) of
available markers mean that multiple members of each
family need to be typed to obtain useful linkage
information, even when only two share the trait of
interest. While these disadvantages can be overcome to
[...] well as by developing closely-spaced, extremely polymorphic
markers, the use of discrete markers for specific map
intervals has inherent limitations. The limitations are
particularly marked when applied to mapping strategies
that seek to reduce the number of individuals that need to
be analyzed in each family.

It is generally much easier to collect many pairs of
related individuals who share a trait of interest than to
collect a few large, well-documented pedigrees in which
the trait is segregating. In the former case, the
absolute number of individuals is smaller. For medically
significant traits, affected individuals are likely to
present themselves for examination, whereas other family
members need to be traced and recruited. Furthermore,
individuals vital to pedigree analysis may be deceased.
Yet the low density and low information content of
available markers make the use of pedigrees almost
mandatory for linkage mapping by RFLP analysis.




Collection of appropriate families frequently poses the
principal barrier to mapping genes that influence human
traits, particularly genetically complex traits.
Strategies have been reported for linkage mapping using
the information in DNA from multiple very small sets of
affected relatives (typically pairs or even single
individuals). However, these strategies depend upon the
availability of closely-spaced, highly informative genetic
markers throughout the genetic map.

The issues raised above in reference to linkage
mapping also apply to genetic risk assessment in medicine.

In principle, any base that differs among allelic
sequences could serve as a marker for linkage analysis.
Single-base differences between allelic single-copy
sequences from two different haploid genomes have been
estimated to occur about once per 300 bp in an outbred
Western European population. This calculates to a total
of about 10^7 potential markers for linkage analysis per
haploid genome. Only a tiny fraction of these nucleotide
differences contribute to mapping using current methods.
There is, therefore, substantial interest in developing
new methods that utilize the available genomic information
more efficiently and can provide information concerning
multi-gene traits. Such methods could be valuable, not
only for gene mapping, but also for genetic diagnosis and
risk assessment.
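The marker estimate above can be checked with a few lines of arithmetic. This is only a sketch: the haploid genome size of about 3 x 10^9 bp is an assumption that the text does not state, and the function name is illustrative.

```python
# Rough marker-count estimate from the figures quoted in the text.
HAPLOID_GENOME_BP = 3e9   # assumption: approximate human haploid genome size (not given in the text)
BP_PER_DIFFERENCE = 300   # from the text: about one single-base difference per 300 bp


def potential_markers(genome_bp: float = HAPLOID_GENOME_BP,
                      bp_per_difference: float = BP_PER_DIFFERENCE) -> float:
    """Expected number of single-base differences between two haploid genomes."""
    return genome_bp / bp_per_difference


if __name__ == "__main__":
    print(f"about {potential_markers():.0e} potential markers")
```

Under that assumed genome size the result is on the order of 10^7, in line with the figure quoted above.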

Relevant Literature 

The use of RFLPs is described
in Botstein, et al. (1980) Am. J. Hum. Genet. 32:314-331;
Donis-Keller, et al. (1988) Cell 51:319-337; Kidd, et al.
(1989) Cytogenet. Cell Genet. 51:622-947; and Risch (1990)
Am. J. Hum. Genet. 46:242-253. Mapping strategies may be
found in Risch (1990) Am. J. Hum. Genet. 46:229-241;
Lander and Botstein (1987) Science 236:1567-1570; and
Bishop and Williamson (1990) Am. J. Hum. Genet.
46:254-265. Sandra and Ford (1986) Nucleic Acids Res.
14:7265-7282 and Casna, et al. (1986) Nucleic Acids Res.
14:7285-7303 describe genomic analysis.

SUMMARY OF THE INVENTION
Genomic analysis is achieved through the process of:
digesting DNA to be compared from two different sources,
usually individuals who are genetically related or
suspected of being genetically related, with a restriction
enzyme that cuts relatively infrequently; combining single
strands of the genomic fragments from the two individuals
under conditions whereby heterohybrids (hybrids containing
one strand from each individual) can be distinguished from
homohybrids (hybrids containing both strands from the same
individual); separating homohybrids from heterohybrids;
separating mismatch-free heterohybrids from hybrids with
mismatches; preparing labeled probes from the mismatch-free
heterohybrids; and identifying regions of genetic
identity between the two individuals by means of said
labeled probes. The mismatch-free heterohybrids provide
[...] descent, since sufficiently large hybrid DNA molecules
formed from non-identical regions are expected to have at
least one and usually many base mismatches.

Alternatively, one may map regions of heterozygosity
and, by inference, homozygosity in a single individual by
isolating DNA fragments substantially free of multicopy
DNA; melting the DNA and reannealing to provide for hybrid
fragments; introducing nicks specifically in mismatched
DNA; labeling the nicked DNA; and using the labeled DNA as
probes to identify regions of heterozygosity. Regions of
homozygosity or hemizygosity (where all or a significant
portion of a chromosome is missing, e.g. aneuploidy) are
inferred by the absence of hybridized label.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Methods and compositions are provided for genomic
mapping by identifying regions of the genome at which DNA
sequences from two DNA sources are perfectly identical
over long stretches (typically 10^3 to 2 x 10^4 nt).
Depending upon the nature of the probe, the procedures may
vary. In a procedure that allows for labeling of regions
of genetic identity between two individuals, DNA sources
are digested with a restriction enzyme that cuts
infrequently; the resulting DNA is processed to isolate
mismatch-free heterohybrid dsDNA fragments free from
mismatch-containing or homohybrid fragments; the mismatch-free
heterohybrid fragments are labeled and then used to
identify regions of genetic identity between the two
sources.

Alternatively, to map regions of homozygosity or
heterozygosity in an individual genome, one may digest the
DNA from that individual; optionally, remove multi-copy
DNA; melt and reanneal the DNA; introduce nicks into
mismatched dsDNA; label the mismatched dsDNA; and use the
labeled DNA as probes for identifying genomic regions that
are heterozygous, leaving regions that are homozygous or
hemizygous having substantially lower labeling.

The DNA source may be any source, haploid to
polyploid genomes, normally eukaryotic, and may include
vertebrates and invertebrates, including plants and
animals, particularly mammals, e.g. humans. The DNA will
be of high complexity, where each of the sources will
usually have greater than 5 x 10^4 bp, usually greater than
10^6 bp, more usually greater than about 10^7 bp. Thus, in
any situation where one wishes to compare two sources of
DNA as to their genetic similarities, whether the sources
are related or not, the subject method may be employed.
Usually, the sources will be related, being of the same
species, and may be more closely related in having a
common ancestor not further away than six, frequently
four, generations.

For linkage mapping or genetic diagnosis, genetically
related individuals are required. Thus, the subject
method may find application in following segregation of
traits associated with breeding of plants and animals, the
association of particular regions in the genomic map with
particular traits, especially traits associated with
multiple genes, the transmission of traits from ancestors
or parents to progeny, the interaction of genes from
different loci as related to a particular trait, and the
like. While only two sources may be involved in the
comparison, a much larger sampling may be involved, such
as 20 or more sources, where pairwise comparisons may be
made between the various sources. Relationships between
the various sources may vary widely, e.g. grandparents and
grandchildren; siblings; cousins; and the like.

Depending upon whether the regions of genetic
identity or regions of non-identity are to be labeled, the
treatment of the DNA from the source will vary. The DNA
may be processed initially in accordance with conventional
ways, lysing cells, removing cellular debris, separating
the DNA from proteins, lipids or other components present
in the mixture and then using the isolated DNA for
restriction enzyme digestion. See Molecular Cloning, A
Laboratory Manual, 2nd ed. (ed. Sambrook et al.) CSH [...]
at least about 0.5 µg of DNA will be employed, more
usually at least about 5 µg of DNA, while less than 50 µg
of DNA will usually be sufficient.

The following procedure will address solely the
methodology employed for isolating and labeling of DNA
corresponding to regions of identity-by-descent between
two related individuals. The total DNA from both cellular
sources is digested completely with a restriction enzyme
that cuts relatively infrequently, generally providing
fragments of strands of about 0.2-10 x 10^4 nt, preferably of
about 0.5-2 x 10^4 nt. The size is selected to substantially
ensure the presence of at least one GATC sequence, and at
least one base difference between any allelic fragments
not identical by descent, i.e. to ensure that homohybrid
fragments, or heterohybrid fragments that are not
identical by descent, sustain at least one cut in a
subsequent step. This enzyme will normally recognize at
least a 6-nucleotide consensus sequence and may involve
either blunt-ended or staggered-ended cuts. For the
method described in detail here, an enzyme that cuts to
leave a protruding 3' end is needed. The protruding 3'
ends are preferred specifically when exonuclease III
digestion, followed by BNDC binding, is ultimately used to
eliminate homohybrids and mismatched DNAs. Restriction
enzymes yielding other sorts of ends may be preferred if
other specific steps are substituted, as described further
below.
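The reasoning behind the fragment size range can be illustrated numerically. The sketch below assumes that base differences and GATC sites fall independently along a fragment (a Poisson approximation), uses the rate of about one difference per 300 bp quoted in the Background, and assumes equal base frequencies so that GATC begins at roughly 1 in 256 positions. None of these modeling choices comes from the procedure itself.

```python
import math

MISMATCH_RATE = 1 / 300   # from the text: about one base difference per 300 bp
GATC_RATE = 1 / 4 ** 4    # assumption: equal base frequencies, GATC at ~1 in 256 positions


def p_at_least_one(rate_per_nt: float, length_nt: int) -> float:
    """Poisson approximation to P(at least one event in a fragment of this length)."""
    return 1.0 - math.exp(-rate_per_nt * length_nt)


if __name__ == "__main__":
    for length in (2_000, 5_000, 20_000):
        p_mismatch = p_at_least_one(MISMATCH_RATE, length)
        p_gatc = p_at_least_one(GATC_RATE, length)
        print(f"{length:>6} nt  P(mismatch) = {p_mismatch:.6f}  P(GATC) = {p_gatc:.6f}")
```

Under these assumptions, fragments of a few thousand nucleotides that are not identical by descent would almost always carry both a GATC site and at least one base difference, which is the property the size selection is meant to secure.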

The resulting DNA fragments are then processed to
provide for a means of separating complementary DNA
hybrids, where the two strands are from different sources,
from complementary DNA hybrids where the two strands are
from the same source. This may be achieved in different
ways. The method exemplified in the subject invention
uses the following steps. DNA from one of the sources is
methylated with a sequence-specific methylase, such as Dam
methylase or a restriction methylase, so as to
substantially completely methylate the consensus sequences
of the DNA from one of the sources. The other source is
left unmodified or methylated completely with a different
restriction methylase.

The two DNA samples are then mixed, denatured and
allowed to reanneal. A practical rate of complete
annealing of the complex DNA samples can be achieved by
using chemical or protein catalysts under conditions that
preserve large DNA strands (Casna, et al. (1986) Nucleic
Acids Res. 14:7285-7303; Barr and Emanuel (1990) Anal.
Bioch. 186:369-373; Anasino (1986) ibid 152:304-307). It
is also necessary to avoid or minimize network formation
resulting from rapid hybridization of non-allelic repeated
sequences, so that simple fully-duplex products can be
recovered even when dispersed repetitive sequences are
embedded in them. Annealing conditions that have been
shown to meet this requirement are FPERT conditions as
described in Casna (1986) supra.

The reannealed DNA mixture is digested with two
methylation-sensitive restriction endonucleases. Hybrids
formed between the two different DNA specimens will be
hemimethylated at the methylation sites. The restriction
enzymes are selected so as to digest sites which are
unmethylated or doubly methylated, while being incapable
of cleaving the hemimethylated sites. Desirably, the
sequence will be a relatively common sequence, generally
occurring on average about once every 1-10 x 10^2 bp, preferably
about once every 1-5 x 10^2 bp. A more stringent selection for mixed-donor
hybrids can be achieved by using additional combinations
of restriction/modification enzymes (Casna (1986) supra).
Optionally, one may remove unannealed DNA (single-stranded
DNA) at this time using any convenient method, e.g.
adsorption to BNDC.

Various combinations of enzymes can be employed. The
combination should ensure that there is at least one cut
in any homohybrid fragment, preferably at least two or
more, so the sequence should be relatively common. For
example, E. coli dam methylase, which recognizes GATC for
methylation, may be employed for methylation. This
[...] sensitive restriction endonucleases DpnI and MboI, which
cleave at GATC sites, the former at doubly-methylated
sites and the latter at unmethylated sites. The
particular sequence "GATC" is found every few hundred bp
in the human genome. While a single
restriction/modification site is preferable, two or three
sites may be involved where different combinations of
modification and restriction enzymes are employed. With
DNA from sources other than human, other combinations of
modification enzymes and restriction enzymes may be
employed.
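The selection logic of the Dam methylase, DpnI and MboI combination can be written out as a small truth table. The sketch below is illustrative only: the source labels "A" (Dam-methylated) and "B" (left unmethylated) and the function name are hypothetical.

```python
# Hypothetical source labels: sample "A" is treated with Dam methylase, sample "B" is not.
SOURCE_IS_METHYLATED = {"A": True, "B": False}


def gatc_fate(top_source: str, bottom_source: str) -> str:
    """Fate of a GATC site in a duplex, given the source of each strand."""
    top = SOURCE_IS_METHYLATED[top_source]
    bottom = SOURCE_IS_METHYLATED[bottom_source]
    if top and bottom:
        return "cut by DpnI"       # doubly methylated site
    if not top and not bottom:
        return "cut by MboI"       # unmethylated site
    return "resistant"             # hemimethylated site, found only in heterohybrids


if __name__ == "__main__":
    for top in "AB":
        for bottom in "AB":
            print(f"{top}/{bottom}: {gatc_fate(top, bottom)}")
```

Only the duplexes with one strand from each source come through both digestions uncut, which is why the GATC site needs to be common enough that every homohybrid fragment contains at least one.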

An alternative procedure is as follows. By cutting
the two DNA samples using a different restriction enzyme
for each, specifically two enzymes that share a common
recognition sequence but in one case cut with an N-base 3'
overhang and in the other with an N-base 5' overhang, only
the heterohybrids will have flush ends. An example of
such a pair of restriction enzymes, for the case N=4, is
Acc65I and KpnI. The hybrids with both strands derived
from the same DNA sample will retain either a 5' or 3'
overhanging end.
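The end geometry for the Acc65I and KpnI pair can be traced position by position. In the sketch below, the six positions of the shared GGTACC site are numbered 1 to 6 along the top strand; Acc65I is taken to cut after position 1 (G^GTACC) and KpnI after position 5 (GGTAC^C) on each strand. The table records, for the fragment lying to the right of the site, the first site position that each strand still covers. The data layout and names are illustrative.

```python
# First position of the GGTACC site (numbered 1-6 on the top strand) retained by each
# strand of the fragment to the right of the cut.
# Acc65I: G^GTACC (4-base 5' overhang); KpnI: GGTAC^C (4-base 3' overhang).
LEFT_END = {
    "Acc65I": {"top": 2, "bottom": 6},
    "KpnI": {"top": 6, "bottom": 2},
}


def end_type(top_enzyme: str, bottom_enzyme: str) -> str:
    """End structure of a duplex whose strands came from the given digests."""
    top = LEFT_END[top_enzyme]["top"]
    bottom = LEFT_END[bottom_enzyme]["bottom"]
    if top == bottom:
        return "flush"
    # A top strand reaching further left protrudes with its 5' end; otherwise the
    # bottom strand protrudes with its 3' end.
    return "5' overhang" if top < bottom else "3' overhang"


if __name__ == "__main__":
    for top in LEFT_END:
        for bottom in LEFT_END:
            print(f"top {top:6s} / bottom {bottom:6s}: {end_type(top, bottom)}")
```

Both mixed pairings come out flush, while each same-sample pairing keeps its own overhang, matching the behavior described above.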

The flush-ended heterohybrids will uniquely be able
to be ligated to a flush-ended partner, such as an
oligonucleotide. This oligonucleotide might, for example,
have a hairpin structure, such that the "capped" ends are
protected from exonuclease digestion. Variations on this
principle are possible, exploiting the distinctive
structures of the ends of heterohybrids compared with
homohybrids, when related restriction enzyme pairs are
used.

This strategy for selecting heterohybrids can replace
the MboI/DpnI digestion step in selecting heterohybrids.
However, a 5' to 3' exonuclease, such as bacteriophage
lambda exonuclease, or a combination of a helicase and
exonuclease VII and/or I, or another combination of
enzymes to allow digestion of all uncapped ends, would need
to be used in addition to, or in place of, exonuclease III,
following the MutHLS nicking step in the procedure
outlined below.

Other techniques may also be used for the initial
separation of homohybrids from heterohybrids. By growing
the cells from one of the sources with heavy isotope-labeled
nucleosides, heavy atom labeled nucleosides, or
other isotope labeled precursors, the two strands from the
different sources will differ in density. Isotopes that
may be used include 15N, 13C, 2H, etc. The duplexes may
then be separated by density banding.

Alternatively, one may label the DNA from the two
sources with different labels, e.g. using labeled
nucleotides and terminal deoxynucleotidyl transferase, by
random conjugation, and the like. One can then separate
all of the duplexes as to one label, and then divide that
group into homo- and heterolabeled duplexes. For example,
biotin and avidin may be used to separate at one stage,
where the avidin is bound to magnetic beads, and
2,4-dinitrophenyl and anti-(2,4-dinitrophenyl) may then be
used for the second separation, where the
anti-(2,4-dinitrophenyl) is bound to a support.

Alternatively, one may select restriction enzymes
which provide for overhangs, where heterohybrids will
result in blunt ends or overhangs that differ from those
of the homohybrids. One may then use the overhangs to
separate homohybrids, leaving the heterohybrids.

Returning to the description of a principal
embodiment, the resulting mixture of uncut heterohybrids
and MboI- or DpnI-cut homohybrids is then subjected to a
system which allows for separation of DNA duplexes with
some mismatched base pairs from complementary perfectly-matched
DNA duplexes. See, for example, Lahue, et al.
(1989) Science 245:160-164; Su and Modrich (1986) Proc.
Natl. Acad. Sci. 83:[...]; [...] (1989)
J. Biol. Chem. 264:1000-1004; Su, et al. (1989) Genome
31:104-111; and Learn and Grafstrom (1989) J. Bacteriol.
171:6473-6481.

Illustrative of such a system is the "methyl-directed
[...] purified mutS, mutL, mutH and uvrD gene products, as well
as exonuclease I, exonuclease VII or RecJ exonuclease,
single strand binding (SSB) protein and DNA
polymerase III. In vitro, seven of the eight possible
single-base mismatches, as well as small insertions and
deletions, are efficiently recognized. When DNA synthesis
is prevented, the system specifically introduces large
gaps in the mismatch-containing molecules. The purified
MutS, MutL and MutH proteins act in concert to introduce
nicks specifically into DNA molecules that contain base
mismatches. With the exception of the C-C mismatch, the other
mismatches are effectively identified by one or more of
the MutX enzymes (X indicates S, L or H).

Exonuclease III can initiate exonucleolytic digestion
at a nick and digest the nicked strand in the 3' to 5'
direction to produce a gap. Exonuclease III can also
initiate digestion at a recessed or flush 3' end, but it
cannot initiate digestion at a protruding 3' end of a
linear duplex. Digestion from the ends of the linear DNA
hybrids can, therefore, be prevented by choosing a
restriction enzyme that produces protruding 3' ends for
the initial digestion of the genomic DNA (Henikoff (1984)
Gene 28:351-360). The DNA ends produced by the
restriction enzymes used to make the smaller fragments for
distinguishing hemimethylated sites from unmethylated or
dimethylated sites are selected to provide recessed (e.g.
MboI) or flush (DpnI) 3' termini, and these termini are
susceptible to digestion by exonuclease III. Therefore,
exonuclease III provides for partial single strands in
hybrid molecules where both strands are derived from the
same individual or where the strands contain base
mismatches.

In carrying out the process, the duplexes obtained
from annealing and restriction digestion are exposed to
the methyl-directed mismatch repair system in vitro.
The MutS, MutL and MutH proteins introduce nicks specifically into
the mismatched DNA molecules, while the exonuclease III
can introduce gaps at nicked sites and at recessed 3'
termini. The mismatch-free heterohybrids, corresponding
to regions of identity-by-descent between the sources, can
now be distinguished from all other duplexes by virtue of
the absence of significant gaps or partial single-strand
regions.

The partially single-stranded and single-stranded DNA
is now separated from the fully-duplex DNA. This can be
efficiently achieved using benzoylated naphthylated DEAE
cellulose (BNDC). At high salt concentration, BNDC
retains single-stranded and partially single-stranded DNA
molecules with high efficiency and may be separated by
centrifugation or other separation means. The unbound DNA
molecules are recovered, which in this case comprise
complementary sequences from the two sources.
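The selection described over the preceding paragraphs can be summarized as a filter over annealed duplexes. The sketch below is an idealized model, not the laboratory procedure: it assumes every homohybrid contains at least one GATC site and so is cut by DpnI or MboI, and that every mismatched duplex is nicked by MutS, MutL and MutH (the text notes that the C-C mismatch is an exception). The `Duplex` class, its fields and the source labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    top_source: str      # hypothetical label of the individual supplying the top strand
    bottom_source: str   # hypothetical label of the individual supplying the bottom strand
    mismatches: int      # number of base mismatches in the duplex


def is_heterohybrid(d: Duplex) -> bool:
    return d.top_source != d.bottom_source


def retained_by_bndc(d: Duplex) -> bool:
    """True if the duplex ends up partly single-stranded and so binds BNDC."""
    # Idealization: homohybrids are always cut at a GATC site, leaving recessed or
    # flush 3' ends at which exonuclease III can start.
    cut_by_dpni_or_mboi = not is_heterohybrid(d)
    # Idealization: any mismatch leads to a nick, which exonuclease III widens to a gap.
    nicked_by_mut_hls = d.mismatches > 0
    return cut_by_dpni_or_mboi or nicked_by_mut_hls


def select_unbound(duplexes: list) -> list:
    """Duplexes recovered in the unbound fraction (mismatch-free heterohybrids)."""
    return [d for d in duplexes if not retained_by_bndc(d)]


if __name__ == "__main__":
    pool = [Duplex("A", "A", 0), Duplex("B", "B", 0),
            Duplex("A", "B", 3), Duplex("A", "B", 0)]
    print(select_unbound(pool))
```

In this idealized pool only the mismatch-free duplex with one strand from each source is recovered, which corresponds to a region of identity-by-descent.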

A more complete methyl-directed mismatch repair
enzyme system may ultimately prove to be superior in
specificity to the simple system using only MutS, MutL and
MutH. For example, MutL, MutS and MutH, plus helicase II
(UvrD protein), exonuclease I, exonuclease VII, single-strand
binding protein, and DNA polymerase III, acting in
concert, can carry out mismatch-dependent DNA synthesis,
and thereby specifically introduce labeled or modified
nucleotides into mismatch-containing DNA molecules. For
example, by using biotinylated nucleotides, mismatched DNA
molecules can be specifically biotinylated, and then
immobilized avidin or related methods can be used to
remove the mismatched molecules from a mixture of
perfectly-matched and mismatched DNA molecules.
Alternatively, use of the same set of enzymes, excluding
DNA polymerase, would lead directly to gaps in the
mismatched DNA, eliminating the exonuclease III step. For
such a procedure, the ends of the linear DNA molecules
produced in the initial annealing step would need to be
protected from the action of the exonucleases and
helicase. This procedure would thus interface well with
the end-capping method suggested above as an alternative
method for selection for heterohybrid molecules.
[...] based on mismatch repair proteins from E. coli or other
organisms could in principle substitute here for the
particular enzyme system described.

The mismatch-free heterohybrid dsDNA duplexes may be
used without expansion. The subject method provides a
sufficiently clean separation of the mismatch-free
heterohybrid dsDNA duplexes in sufficient amount to allow
for their labeling and direct use as probes. By using a
readily available amount of DNA which can be efficiently
handled, generally from about 0.5 to 100 µg DNA, usually
about 1 to 10 µg DNA, from each source, a satisfactory
amount of the mismatch-free heterohybrid dsDNA duplexes
is obtained for labeling and probing a DNA sample.

The mismatch-free heterohybrid dsDNA sequences from
the two sources can be used for identifying the regions of
identity-by-descent between the two sources.
Conveniently, the dsDNA may be labeled for use as a probe
to identify the corresponding genomic regions. A wide
variety of labels may be employed, particularly radioisotopes,
fluorescers, enzymes, and the like. The
particular choice of label will depend upon the desired
sensitivity, the nature of the genomic sample being
probed, the sensitivity of detection required, and the
like. Thus, the more complex the genome under analysis,
the higher the sensitivity which would be desired.
Various instruments are available which allow for
detection of radioactivity, fluorescence, and the like.

The probes may be prepared by any convenient
methodology, such as nick translation, random hexamer
primed labeling, polymerase chain reaction using primers
that prime outward from dispersed repetitive sequences or
random sequences, and the like.

The DNA that is probed may take a variety of forms,
but essentially consists of a physically-ordered array of
DNA sequences that can be related back to the physical
arrangement of the corresponding sequences in the genome.
(See Boyle, et al. (1990) Genomics 7:127-130; Pinkel,
et al. (1988) PNAS USA 87:6634-6638). A metaphase
chromosome spread is one naturally-occurring example of
such an array. Alternatively, and preferably, a partial
or a complete collection of cloned, amplified, or
synthetic DNA sequences corresponding to known genetic
locations, immobilized in an ordered array on a solid
substrate such as a membrane or a silicon or plastic chip,
can be used as the target for probing.

The hybridizations with the probes are performed
under conditions that allow the use of complex mixtures of
DNA probes and suppress artifactual hybridization to
repeated sequences (Boyle, et al. (1990) Genomics
7:127-133; Pinkel, et al. (1988) Proc. Natl. Acad. Sci.
USA 85:9138-9142; Lichter, et al. (1990) ibid
87:6634-6638). For example, in the case of grandparent-grandchild
pairs, hybridization should occur in
approximately 25 large patches (averaging about 4 µm in
length when prometaphase chromosomes are used), which in
aggregate should cover about one-half of the genome.
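The one-half figure for grandparent-grandchild pairs follows from counting transmissions at a single locus. The sketch below enumerates them exactly; it assumes an autosomal locus, no inbreeding, and no relationship between the pair other than the single line of descent. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def lineal_ibd_fraction(generations: int) -> Fraction:
    """Expected fraction of the genome at which a descendant carries an allele
    identical by descent with a given ancestor.

    generations = 1 is parent-child, 2 is grandparent-grandchild, and so on.
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    # The ancestor's own child always receives one of the ancestor's two alleles.
    # At each later meiosis that allele is either passed on (True) or not (False),
    # with equal probability.
    outcomes = list(product((True, False), repeat=generations - 1))
    shared = sum(all(outcome) for outcome in outcomes)
    return Fraction(shared, len(outcomes))


if __name__ == "__main__":
    for g, label in ((1, "parent-child"), (2, "grandparent-grandchild"),
                     (3, "great-grandparent-great-grandchild")):
        print(f"{label}: {lineal_ibd_fraction(g)}")
```

For two generations the enumeration gives one-half, consistent with the coverage stated above; the number and size of the patches depend on where crossovers fall and are not modeled here.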
The boundaries of the patches will be determined by
the ends of chromosomes and sites of meiotic crossing
over. The boundaries will reflect sites of meiotic
recombination that occurred in meioses intervening between
the two relatives. Since the areas of hybridization will
typically be in large contiguous patches, the method is
very robust with respect to contamination of the probe
with sequences representing regions that are not identical
between the two subjects. Even if only a modest
enrichment of identical-by-descent sequences is achieved,
the patches of identity and non-identity should be
distinguishable as contiguous blocks of greater or lower
signal intensity.
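The robustness argument can be illustrated with a toy block-calling routine over an ordered array of hybridization signals. This is a sketch under stated assumptions: the moving-average window, the fixed threshold and the example intensities are all hypothetical, and a real analysis would need calibrated signal values.

```python
def call_blocks(signal, threshold, window=3):
    """Smooth an ordered list of intensities with a centered moving average, then
    return contiguous runs as (start, end_exclusive, is_high) tuples."""
    n = len(signal)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))

    blocks = []
    for i, value in enumerate(smoothed):
        high = value >= threshold
        if blocks and blocks[-1][2] == high:
            # Extend the current run.
            blocks[-1] = (blocks[-1][0], i + 1, high)
        else:
            # Start a new run.
            blocks.append((i, i + 1, high))
    return blocks


if __name__ == "__main__":
    # Hypothetical intensities along an ordered array; positions 2 and 8 are noisy.
    intensities = [9, 8, 2, 9, 8, 9, 1, 2, 8, 1, 2, 1]
    print(call_blocks(intensities, threshold=5))
```

Because neighboring positions are averaged, an isolated aberrant signal inside a long patch does not split the patch, which mirrors the point that enrichment need only be modest when the regions are large and contiguous.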

Alternatively, rather than using the selected
mismatch-free heterohybrid restriction fragments as
probes, they themselves can be immobilized on a solid
substrate, typically a Southern blot performed after
resolving the DNA restriction fragments by gel
electrophoresis. The immobilized fragments can then be
[...] to regions of interest. The presence or absence of
hybridizing bands recovered from a specific pairwise
comparison will indicate whether the pair has identity by
descent at the locus in question. This procedure is
likely to be useful for refining the resolution of a map,
after initial mapping is achieved by using selected
fragments as probes.

Genetic regions identified by this mapping method may
be cloned by standard methods or in some cases by direct
cloning of the sequences selected by genomic mismatch
scanning. These sequences can then be analyzed for their
biological function, and in some cases used directly or in
synthetic or modified form for diagnostic or therapeutic
applications.

When the region of genetic identity between two or
more individuals is sufficiently small, for example, in
plant breeding when products of serial backcrosses with
selection for a useful trait are compared, it may be
useful to clone the selected identical-by-descent
restriction fragments, since the resulting clone pool
would be highly enriched for the desired gene sequence
(responsible for the selected trait).
An alternate embodiment of genetic mapping looks to
differences within an individual genome. One may look to
identify regions of a genetic map where an individual or a
sample is or is not heterozygous. This method is
predicated on the isolation of single-copy sequences
corresponding to regions of heterozygosity in the test
individual, based on the ability of that individual's DNA
to give rise to hybrid DNA molecules with base mismatches
when the individual's DNA is denatured and reannealed.
The selected mismatched hybrid sequences are then labeled
and used as probes for hybridization to a physically
ordered array of a genomic DNA sample. Because regions of
the genome that lack heterozygosity are unable to produce
single-copy DNA hybrids with base mismatches, these
regions are visualized as gaps in the hybridization
pattern. The following is an exemplary protocol.
The DNA sample is digested to completion with a
restriction enzyme that cuts the DNA frequently enough
that most of the resulting fragments should not contain
repetitive sequences. Illustrative restriction enzymes
include enzymes having four-nucleotide consensus
sequences, such as AluI, RsaI, TaqI, HaeIII and MboI. The
resulting fragments will for the most part be in the range
of about 200-4000 nucleotides. After melting or
denaturing the DNA, the DNA is allowed to reanneal in
solution, where the rapidly reannealing multi-copy or
repeated DNA sequences (low C0t) are removed, for example,
by hydroxyapatite chromatography, based on their rapid
reannealing, followed by allowing the remaining DNA to
anneal completely. Removal of the low C0t DNA is
desirable, but may not be essential. After complete
reannealing, desirably removing residual unannealed DNA,
the small, mostly single-copy DNA fragments that remain
are incubated with the methyl-directed mismatch repair
proteins described above and ATP to produce nicks in
mismatched DNA duplexes. The nicks introduced
specifically in mismatched molecules by the methyl-directed
mismatch repair allow for nick-translation DNA
synthesis by the DNA polymerase, and thus allow labeling
with radiolabeled or other labeled nucleotide
triphosphates. A polymerase that lacks 3' to 5'
exonuclease activity, but retains 5' to 3' exonuclease
activity, e.g. Taq polymerase, is preferred for this step
to avoid labeling by replacement synthesis at the ends of
the fragments.

The described procedure will provide probes that
densely and specifically cover the regions of the genetic
map that are heterozygous in the test individual.
Conversely, the regions that are not heterozygous will not
provide labeled probes and so should be recognized as
distinctive gaps in the hybridization signal. In general,
except where the coefficient of inbreeding is very low,
the regions of homozygosity will be in contiguous patches

difference in hybridization intensity between homozygous
and heterozygous sites should produce discernible
boundaries. Thus, as indicated previously, the separation
of heterozygous from homozygous sequences need not be
absolute. In addition to clinical and mapping
applications, this procedure is likely to be useful in
plant and animal breeding, since backcrossing and
selection can be used to isolate a gene responsible for a
trait of interest to a small region of heterozygosity or
homozygosity.
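
As an illustration of how such gaps might be located once the hybridization signal has been read along the ordered array, the following is a minimal sketch in Python. The function name find_gaps, the intensity threshold, the minimum run length and the signal values are all assumptions made for illustration; none of them is specified by the procedure described above.

```python
def find_gaps(signal, threshold, min_run):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_run consecutive array positions whose signal is below threshold."""
    gaps = []
    start = None  # index where the current low-signal run began, if any
    for i, value in enumerate(signal):
        if value < threshold:
            if start is None:
                start = i
        else:
            # A run has just ended; keep it only if it is long enough.
            if start is not None and i - start >= min_run:
                gaps.append((start, i))
            start = None
    # Handle a run that extends to the end of the array.
    if start is not None and len(signal) - start >= min_run:
        gaps.append((start, len(signal)))
    return gaps


# Invented, normalized hybridization intensities along an ordered array.
signal = [0.9, 0.8, 0.1, 0.2, 0.1, 0.7, 0.1, 0.9, 0.2, 0.1]
gaps = find_gaps(signal, threshold=0.5, min_run=2)
print(gaps)  # [(2, 5), (8, 10)]
```

Requiring a minimum run length reflects the expectation that homozygous regions form contiguous patches, so that an isolated weak position is not reported as a gap.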

For convenience, kits may be supplied which provide
the necessary reagents in a convenient form and together.
For example, for the genomic mismatch screening, kits
could be provided which would include at least two of the
following: the restriction enzymes, one or more which
provide for average fragments from the target genome of a
size in the range of about 6.5 - 10 x 10^4; one or more
modification enzymes; and restriction enzymes which
distinguish between hemimethylation and unmethylated or
dimethylated consensus sequences; enzymes capable of
introducing nicks at mismatches and expanding the nick to
a gap of many (≥ 10) nucleotides; DNA polymerase; BNDC
cellulose; and labeled triphosphates or labeled linkers
for blunt end ligation or other composition for labeling
the sequences to provide probes. Other components such as
a physically ordered array of immobilized DNA genomic
clones or metaphase chromosomes, automated systems for
determining and interpreting the hybridization results,
software for analyzing the data, or other aids may also be
included depending upon the particular protocol which is
to be employed.

The subject methodology may find particular
application in mapping genes by use of affected relative
pairs, that is, pairs of relatives that have a genetically
influenced trait of interest. "Affected relative pair"
methods are preferred, particularly when the penetrance of
the allele that confers the trait is low or age-dependent,
or when the trait is multigenic or quantitative, e.g.
height and build. Disease-susceptibility genes are
particularly relevant. By determining where on the
genetic map a small set, including two, of "affected"
relatives have inherited identical sequences from a common
source, and disregarding other family members, a highly
efficient strategy for extracting linkage information from
a pedigree is provided. The resulting identity-by-descent
maps from multiple pairs of similarly-affected relatives
can be combined and the composite map searched for loci
where genotypic concordance between affected relatives
occurs more frequently than would be expected by chance.
With a sufficiently large number of affected relative
pairs, such an analysis can reveal the positions of genes
that contribute even a slight susceptibility to the trait.
The procedure may also find wide application in routine
screening for shared genetic risks in families.
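
The search of a composite map for loci with excess concordance can be sketched as a one-sided binomial test at each locus. This is only one possible way to score the composite map and is not taken from the specification: the function names, the null sharing probability of 0.5 (which depends on the type of relative pair), the significance cutoff and the example data are assumptions, and no correction for testing many loci is applied.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def excess_sharing_loci(ibd_maps, p_null, alpha):
    """ibd_maps holds one list per relative pair; each entry is True where the
    pair is identical by descent at that locus. Returns (locus index, number
    of sharing pairs, p-value) for loci shared more often than chance."""
    n_pairs = len(ibd_maps)
    hits = []
    for locus in range(len(ibd_maps[0])):
        shared = sum(1 for pair_map in ibd_maps if pair_map[locus])
        p_value = binomial_tail(shared, n_pairs, p_null)
        if p_value < alpha:
            hits.append((locus, shared, p_value))
    return hits


# Invented data: 10 affected pairs scored at 3 loci.
# Locus 0 is shared by 5 pairs, locus 1 by all 10, locus 2 by 6.
ibd_maps = [[i < 5, True, i < 6] for i in range(10)]
hits = excess_sharing_loci(ibd_maps, p_null=0.5, alpha=0.01)
print(hits)  # [(1, 10, 0.0009765625)]
```

With only ten pairs, a locus must be shared by nearly every pair to stand out, which illustrates why a sufficiently large number of affected relative pairs is needed to detect genes of slight effect.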

The following examples are offered by way of
illustration and not by way of limitation.

EXPERIMENTAL

Saccharomyces cerevisiae, baker's yeast, was used as
a test system, because this is genetically the best
characterised eukaryotic organism, and it is easy to
prepare and characterize yeast clones of defined genetic
relatedness.

Test of the method: Two independently isolated
haploid clones of Saccharomyces, Y55 (HO his4 leu2 CAN1^s
ura3 GAL2) and Y24 (ho HIS4 LEU2 MATa can1^R URA3 GAL2),
both derivatives of common lab strains, were used for the
experiment. We estimate from RFLP analysis that Y55 and
Y24 (a derivative of S288C) differ at approximately one
base pair per 100. The two strains were mated, and the
resulting diploid hybrid was sporulated, yielding 4
haploid spore clones. For any given genetic locus, each
spore clone ("daughter") received either the Y55 or the
Y24 allele. The purpose of the test was to determine if
we could specifically isolate, en masse, DNA from all the
loci at which two individuals (here, pairs of parent and

from all regions where there was no identity by descent
between the pair. We applied our genome mismatch scanning
method to determine, for several loci, which spore clones
had identity by descent with each parent. The results of
the genome mismatch scanning analysis were compared with
results from conventional analysis (Table 1), using
auxotrophic and drug resistance markers. Conventional
analysis consisted of testing for growth on appropriate
selective media. Four loci were tested: HIS4, CAN1, URA3,
GAL2. HIS4 is on chromosome 3, CAN1 and URA3 on
chromosome 5, and GAL2 is on chromosome 12. Our analysis
of these loci included a total of 15 independent PstI
restriction fragments, each of which constituted an
independent test of the genomic mismatch scanning method,
as their sometimes adjacent location in the genome was
immaterial to their behaviour in the selection. The
result of the test was that all 15 PstI restriction
fragments analyzed were recovered if and only if they were
identical by descent between the parent and daughter being
compared (Tables 1 and 2). This result confirms the
principles underlying genomic mismatch scanning.

Procedure:

1. DNA Isolation: High molecular weight DNA was
isolated from each yeast strain (parent or spore clone) by
a standard method (Methods in Enzymology 194
(chapter 11): 169-182).

2. Initial Restriction Enzyme Digestion: Each DNA
sample was digested completely with PstI restriction
enzyme. The DNA was recovered by phenol:chloroform
extraction and ethanol precipitation, and resuspended in
Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

3. Methylation of DNA From Parental Strains with
Dam Methylase: DNA samples from the two parental strains,
Y55 and Y24, were fully methylated with E. coli Dam
methylase (New England Biolabs), at a DNA concentration of
0.25 mg/ml, using 4 units of enzyme/µg of DNA in an
overnight incubation at 37°, in the buffer recommended by
the manufacturer. The samples were extracted with
phenol:chloroform, ethanol precipitated, and resuspended
in Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

4. Mixing and Solution Hybridization of Paired Test
DNA Samples:

A. Y55 DNA (5 µg in 45 µl) + spore clone 1b DNA
(5 µg in 80 µl) was denatured by adding 7.5 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 16 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer (4M NaSCN, 20 mM
Tris-HCl pH 7.9, 0.2 mM EDTA) were then added, and the
sample was adjusted to 400 µl with water. 90% phenol in
water was added until an emulsion was apparent (about
80 µl), and then the sample was agitated to maintain the
emulsion for 12 hours at room temperature (typically about
23°).

B. Y55 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

C. Y24 DNA (5 µg in 45 µl) + spore clone 1a DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

D. Y24 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

To recover DNA, the samples were each extracted once
with chloroform, then ethanol precipitated, and
resuspended in 200 µl of Tris/EDTA.

5. Digestion of Homohybrid Molecules (Both Strands
From the Same Source) with DpnI and MboI: 105 µl of
each of the homohybrid strands was digested at 37° for 2
hours in a final volume of 400 µl of NEB buffer 3 with 100
units of DpnI and 25 units of MboI.

6. Removal of Residual Unannealed DNA: After
MboI/DpnI digestion, samples were extracted with
phenol/chloroform, 100 µl of 5M NaCl was added to each and
then samples were incubated with 100 mg of BNDC cellulose
(Sigma) equilibrated with 50 µM tris, pH 8.0, 1M NaCl, at
4° for 3 hours. The sample was centrifuged at 14000 rpm
in a microfuge, then the supernatant was extracted twice
with phenol/chloroform and once with chloroform, then
ethanol precipitated, washed with 70% ethanol, dried and
resuspended in 90 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

7. Selective nicking of mismatched hybrid DNA's:
15 µl of each DNA sample was mixed with 5.2 ng of MutH
protein, 340 ng of MutL protein, 700 ng of MutS protein
(all proteins provided in purified form by Paul Modrich,
Duke University), in a final volume of 60 µl of a buffer
consisting of: 50 mM Hepes (pH 8.0), 20 mM KCl, 5 mM
MgCl2, 1 mM DTT, 50 µg/ml bovine serum albumin, and 2 mM
ATP. The mixture was incubated at 37° for 30 minutes, and
the reaction was then stopped by heating to 65° for 10
minutes.

8. Exonuclease III Digestion to Convert Nicks into
Single-Strand Gaps, and Ends from MboI or DpnI Cleavage
into Single-Strand Tails: The volume of the entire sample
from step 7 was adjusted to 200 µl by adding 140 µl of a
buffer consisting of: 50 mM Tris-HCl (pH 8.0), 5 mM
MgCl2, 10 mM β-mercaptoethanol. Then 10 units of
exonuclease III were added and incubation continued for 10
min at 37°. This reaction was stopped by adding EDTA to
10 mM, followed by extraction with phenol/chloroform.

9. Removal of Partially or Fully Single-Stranded
DNA Molecules from the Mixture: 50 µl of 5M NaCl + 250 µl
of 1M NaCl were added to adjust to a volume of 500 µl at a
concentration of 1 M NaCl. 100 mg of BNDC cellulose
equilibrated with 50 mM tris, pH 8.0, 1M NaCl, was added
and the mixture incubated at 4° for 3 hours (Sedat,
et al. (1967) J. Mol. Biol. 26:537-540; Iyer and Rupp
(1971) Bioch. Biophys. Acta 228:117-126). The mixture was
centrifuged at 14000 rpm for 1 min, then the supernatant
was extracted once with phenol/chloroform, and ethanol
precipitated overnight. The small pellets were
resuspended in 15 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.



WO 93/22462 PCT/US93/04160

10. Analysis of the Selected DNA Pool by Southern
Blotting: One-sixth of each of the resulting DNA samples
was electrophoresed through a 0.7% agarose gel in TBE
buffer for 15 hours at 70 volts. DNA was transferred to a
nylon filter by Southern blotting, and the filter was
probed successively with labelled DNA from lambda phage
clones corresponding to the 4 specific genetic loci, HIS4,
CAN1, URA3 and GAL2. In each case, 3-5 restriction
fragments were readily detected in the lanes corresponding
to DNA samples that had identity by descent at the test
loci, and not in the lanes corresponding to samples that
were known from direct tests not to match at the locus
being probed (see Tables 1 and 2).
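
The reagent volumes in steps 4 and 9 can be checked with a short dilution calculation. The sketch below is illustrative only: the helper name final_molarity is hypothetical, the volumes are those stated in the procedure, and the step 9 check assumes that the 200 µl sample carried over from step 8 contributes a negligible amount of NaCl.

```python
def final_molarity(additions, total_volume_ul):
    """additions is a list of (volume in µl, molarity) pairs for one solute;
    returns the molarity of that solute in the stated total volume."""
    amount = sum(volume * molarity for volume, molarity in additions)
    return amount / total_volume_ul


# Step 4, sample A: 45 µl + 80 µl of DNA, then 7.5 µl of 5 M NaOH.
naoh_a = final_molarity([(7.5, 5.0)], 45 + 80 + 7.5)

# Step 4, samples B to D: 45 µl + 45 µl of DNA, then 5.4 µl of 5 M NaOH.
naoh_b = final_molarity([(5.4, 5.0)], 45 + 45 + 5.4)

# Step 9: 50 µl of 5 M NaCl + 250 µl of 1 M NaCl in a final volume of 500 µl.
nacl_step9 = final_molarity([(50, 5.0), (250, 1.0)], 500)

print(round(naoh_a, 3), round(naoh_b, 3), nacl_step9)  # 0.283 0.283 1.0
```

Under these assumptions, the larger NaOH volume used for sample A gives the same denaturing concentration (about 0.28 M) as in the other samples, and the NaCl additions in step 9 give the stated 1 M.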



Table 1.

                           Locus
  Strain          CAN1    URA3    HIS4    GAL2

  parents
    Y24            A       A       A       A
    Y55            B       B       B       B

  daughters
    1a             B       B       A       A
    1b             A       B       A       A
    1c             A       A       B       B



The two alleles at each locus are designated A and B,
respectively for the alleles present in Y24 and Y55. Each
spore clone inherits an allele from one of its two
parents, either the A allele from Y24 or the B allele from
Y55. The alleles at these loci can be distinguished
directly by testing for growth in specific media.

Table 2. Summary of the results of the Genome Mismatch
Scanning test.

                                            Locus
  Comparison #   Relative Pair        CAN1   URA3   HIS4   GAL2

       1         Y24/daughter 1a       -      -      +      +
       2         Y55/daughter 1c       -      -      +      +
       3         Y24/daughter 1c       +      +      -      -
       4         Y55/daughter 1b       -      +      -      -

  number of restriction fragments     [4]    [4]    [3]    [5]


"-" indicates no DNA was recovered for any of the
restriction fragment bands detected by the DNA probe
specific for the indicated locus (neglecting faint bands
from cross-hybridization to unlinked sequences).

"+" indicates recovery of DNA in all restriction bands
detected by the DNA probe specific to the indicated locus.

The number of restriction fragment bands surveyed by the
probe used for the indicated locus is indicated in
brackets in the bottom row of the table. The probes used
in each case were bacteriophage lambda clones from the
ordered collection established by Maynard Olsen. The
specific clones used as probes were: CAN1: clone 5917,
HIS4: clone 4711, URA3: clone 6150, GAL2: clone 6637. The
clone numbers are the numbers assigned by Maynard Olsen.
For convenience, only 4 of the eight possible
parent-daughter combinations were tested. The results of the
genetic tests for 4 loci are shown in Table 1. Numerous
other pairwise comparisons have subsequently been tested
with similar results.
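
The relationship between the two tables can be made explicit with a small sketch: a restriction fragment is expected to be recovered ("+") exactly when the daughter carries the same allele as the parent with which it is compared. The allele assignments below are as read from Table 1; the variable and function names are illustrative assumptions.

```python
LOCI = ["CAN1", "URA3", "HIS4", "GAL2"]

# Allele at each locus (order as in LOCI), as read from Table 1.
ALLELES = {
    "Y24": ["A", "A", "A", "A"],
    "Y55": ["B", "B", "B", "B"],
    "1a": ["B", "B", "A", "A"],
    "1b": ["A", "B", "A", "A"],
    "1c": ["A", "A", "B", "B"],
}

# The four parent/daughter comparisons that were tested.
COMPARISONS = [("Y24", "1a"), ("Y55", "1c"), ("Y24", "1c"), ("Y55", "1b")]


def predicted_recovery(parent, daughter):
    """'+' where parent and daughter share the allele (identity by descent),
    '-' otherwise, one entry per locus in LOCI."""
    return [
        "+" if p == d else "-"
        for p, d in zip(ALLELES[parent], ALLELES[daughter])
    ]


for parent, daughter in COMPARISONS:
    print(parent, daughter, " ".join(predicted_recovery(parent, daughter)))
```

The printed pattern is the outcome predicted from the conventional genetic tests alone, which is what the genome mismatch scanning results are compared against.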



It is evident from the above results that the
subject methodology provides for numerous advantages. The
methods provide access to a large set of highly
polymorphic markers required for linkage mapping with
small family units. A great increase in the effective
number of informative markers is achieved without a
corresponding increase in the number of individual tests,
since all the markers are screened in parallel in a single
procedure. By allowing much smaller sets of related
individuals to be used for linkage mapping, the
affected-relative-pair and homozygosity-by-descent mapping methods
can greatly reduce the cost and labor involved in
developing the human genetic map. Genomic mismatch
scanning allows for the practical application of linkage
mapping to genetically heterogeneous or quantitative
traits, such as cardiovascular disease, asthma,
psychiatric disorders, epilepsy, obesity, cancer and
diabetes.

The subject methodology does not rely on any
previously-mapped genetic markers. Thus, one can use the
subject methodology to begin immediately to develop the
genetic and physical maps of a genome for which little or
no prior map information is available. This can find
particularly important application in the breeding of
plant or animal species, as well as in development of the
genetics of such species.

Each pair-wise analysis allows sites of meiotic
recombination to be mapped. In grandparent-grandchild
pairs, identity-by-descent maps specifically identify the
sites of meiotic recombination in the corresponding

genetic and physical map, locations of sites of enhanced
or diminished recombination, effects of age, sex and other
factors on the frequency and distribution of meiotic
recombination events, and the relationship between
recombination and non-disjunction can be readily
investigated in this way.
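
One simple way to read candidate recombination sites from an identity-by-descent map is to look for adjacent loci at which the identity-by-descent status changes. The sketch below assumes an error-free map given as an ordered list of True/False calls along one chromosome; the function name and the example data are hypothetical.

```python
def crossover_intervals(ibd_along_chromosome):
    """Return (i, i + 1) index pairs of adjacent ordered loci whose
    identity-by-descent status differs. Each such interval is taken to
    contain an odd number of recombination events (at least one)."""
    return [
        (i, i + 1)
        for i in range(len(ibd_along_chromosome) - 1)
        if ibd_along_chromosome[i] != ibd_along_chromosome[i + 1]
    ]


# Invented identity-by-descent calls for six ordered loci.
ibd = [True, True, False, False, False, True]
print(crossover_intervals(ibd))  # [(1, 2), (4, 5)]
```

The resolution of each interval is limited by the spacing of the loci in the ordered array, so a denser array narrows the region in which the recombination event is placed.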

Finally, the ability to detect directly regions of
the genome that have lost heterozygosity may be useful in
identifying putative tumor suppressor genes and in the
earlier diagnosis of malignancies, since loss of
heterozygosity at specific loci appears to be an important
genetic event in the development of many cancers.

All publications and patent applications cited in
this specification are herein incorporated by reference as
if each individual publication or patent application were
specifically and individually indicated to be incorporated
by reference.

Although the foregoing invention has been described
in some detail by way of illustration and example for
purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of
the teachings of this invention that certain changes and
modifications may be made thereto without departing from
the spirit or scope of the appended claims.

WHAT IS CLAIMED IS:

1. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related sources, wherein
each of said sources contributes at least about 5 x 10^4 bp
of DNA, said method comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids.

2. A method according to Claim 1, wherein said
different modification and separating is by means of:

methylating the DNA from one of said related sources
or methylating the DNA from both of said sources with
different methylases;

cleaving said DNA duplexes with a methyl sensitive
restriction enzyme resulting in cleaving of homohybrid
DNA; and

segregating heterohybrid DNA from cleaved homohybrid
DNA;

and said introducing lesions is by means of:

bringing together said heterohybrid DNA with enzymes
of the methyl-directed mismatch repair pathway and an
exonuclease, whereby nicks are introduced into said
heterohybrid DNA comprising a lesion and said lesion is
extended into a gap by said exonuclease to provide
partially single stranded and single stranded DNA;

dividing said partially single-stranded and single
stranded DNA from said perfectly complementary DNA
duplexes; and

labeling said perfectly complementary DNA duplexes
for use in genetic mapping or identification.

3. A method according to Claim 2, wherein said
enzymes comprise MutL, MutS, and MutH of E. coli.

4. A method according to Claim 2, wherein included
in said combining step is helicase II, single-strand
binding protein, and at least one of exonuclease I and
VII; or wherein said exonuclease is exonuclease III of
E. coli.

5. A method according to Claim 2, wherein said
dividing comprises:

combining said partially single-stranded and single
stranded DNA and DNA duplexes with benzoylated
naphthylated DEAE cellulose (BNDC) at high salt
concentration; and

freeing the DNA bound to the BNDC cellulose from the
perfectly complementary DNA duplexes.

6. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related genomes, wherein
each of said genomes contributes at least about 10^6 bp of
DNA, said method comprising:

combining DNA restriction fragments from first and
second related genomes, wherein said DNA substantially
consists of restriction fragments comprising a GATC
sequence, under melting and reannealing conditions to form
homohybrid and heterohybrid DNA duplexes, wherein said DNA
fragments from said first and second sources are
different;

segregating homohybrid from heterohybrid DNA duplexes
by means of the difference in said DNA duplexes;

bringing together heterohybrid DNA duplexes with
enzymes consisting of MutL, MutS, and MutH of E. coli,
resulting in lesions consisting of nicks in DNA duplexes
that contain mismatches;

treating the resulting mixture of nicked and unnicked
DNA molecules with exonuclease III, such that nicked DNA
molecules in the mixture are rendered partially single
stranded; and

separating said partially single stranded DNA from
completely double stranded DNA.

7. A method according to Claim 6, wherein said
method comprises one of:

including in said bringing together, or in a
subsequent step, a DNA polymerase, and labeled
nucleotides, wherein said labeled nucleotides become
incorporated into said nicked DNA duplexes; and

separating said labeled DNA from unlabeled DNA by
means of said label; or

stranded DNA and completely double stranded DNA with
benzoylated naphthylated DEAE cellulose at high salt
concentration; and

separating the DNA bound to the cellulose from the
completely double stranded DNA; or

cleaving partially single stranded DNA at the site of
said single strand to provide small DNA duplexes; and

separating said small DNA duplexes from uncleaved DNA
duplexes.



8. A method according to Claim 6, wherein said DNA
restriction fragments are different in having different
termini, wherein the difference in termini is used to
separate heterohybrids from homohybrids.

9. A method according to Claim 6, wherein said
bringing together further comprises:

including with said heterohybrid DNA, helicase II,
exonuclease I and/or VII, single strand binding protein,
DNA polymerase III, and labeled nucleotides, wherein said
labeled nucleotides become incorporated into partially
single stranded DNA to provide labeled DNA; and

separating said labeled DNA from unlabeled DNA by
means of said label.

10. A method for identifying nucleic acid areas of
identity with a probe obtained by a separating method for
separating DNA duplexes from a complex mixture of DNA from
two related sources, wherein each of said sources
contributes at least about 5 x 10^4 bp of DNA, said method
comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids; and

labeling said perfectly complementary heterohybrids;

said method comprising:

combining said probe with an ordered array of DNA
molecules representing at least a portion of a genetic map
under conditions wherein said probe hybridizes to
homologous DNA; and

detecting the areas of identity by means of said
label.

11. A method according to Claim 10, wherein said
ordered array is a metaphase chromosome spread.

12. A method according to Claim 10, wherein said
ordered array is an array of clones representing at least
a portion of said genetic map.

13. A method according to Claim 10, wherein said
ordered array is an array of DNA sequences amplified
in vitro, representing at least a portion of said genetic
map.

14. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch-containing fragments of said
DNA;

labeling said nicked DNA fragments to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

15. A method according to Claim 14, wherein said
labeling comprises incubating said lesion containing small
DNA fragments with a polymerase which lacks 3' to 5'
exonuclease activity and labeled nucleotide triphosphates.

16. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch containing fragments of said
DNA;

treating said DNA fragments with an exonuclease such
that those fragments with nicks are rendered
single-stranded or partially single-stranded;

separating single-stranded and partially
single-stranded fragments from completely double-stranded DNA
fragments by adsorption to BNDC;

labeling said single stranded or partially
single-stranded small DNA fragments or labeling perfectly
complementary dsDNA to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

17. A method according to Claim 16, wherein said
labeling of said lesion containing partially single
stranded DNA comprises adding DNA polymerase III and
nucleotide triphosphates including a label to said
partially single stranded DNA.

18. A method of genetic mapping comprising:

combining fragmented first genomic DNA from two
different sources under conditions wherein said fragments
can anneal together to form heterohybrid dsDNA duplexes
and homohybrid dsDNA duplexes, wherein the size of the
fragments is selected to substantially ensure that
heterohybrids formed from genetic regions which are not
identical by descent will contain at least one base
mismatch and to ensure the ability to separate perfectly
matched duplexes from mismatched duplexes;

separating heterohybrid duplexes from homohybrid
duplexes and matched duplexes from mismatched duplexes by
preferentially modifying heterohybrid duplexes and
mismatched homohybrid duplexes;

labeling at least one of the strands of said matched
heterohybrid duplexes to provide detectable probes without
expansion of said matched heterohybrid duplexes; and

combining second genomic DNA with said detectable
probes to detect and/or map sites of matched DNA of said
sources.

19. A method according to Claim 18, wherein said
method further comprises:

at least one of said sources

for the DNA from each source; and

said separating comprises:

cleaving said duplexes with at least one restriction
endonuclease, wherein duplexes which are unmethylated or
doubly methylated are cleaved to provide smaller
fragments;

isolating the uncleaved duplexes free of said cleaved
duplexes;

nicking mismatched duplexes and introducing gaps in
said nicked mismatched duplexes while leaving matched
duplexes unchanged; and

segregating gapped duplexes from matched duplexes.



20. A method according to Claim 19, wherein said
nicking and introducing gaps employs the proteins of a
methyl-directed mismatch repair pathway.

21. A method according to Claim 19, wherein said
nicking and introducing gaps employs the proteins of the
methyl-directed mismatch repair pathway and exonuclease
III.

22. A method according to Claim 19, wherein said
segregating comprises combining said duplexes with BNDC
cellulose and separating DNA bound to said BNDC cellulose
from unbound duplexes.

23. A method according to Claim 18, wherein said
second genomic DNA is an ordered array of DNA molecules
representing at least a portion of a genetic map.

24. A kit comprising MutL, MutS, MutH enzymes,
labeled triphosphates or labeled linkers, and a DNA
polymerase.

25. A kit according to Claim 24, further comprising
at least one of the following: a modification enzyme, and
an enzyme that cleaves at other than hemimethylated sites
recognized by said modification enzyme; an exonuclease;
helicase II; single-strand binding protein; BNDC
cellulose; an ordered array of DNA molecules comprising at
least a portion of a genetic map.
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